Skip to content

Conversation

@brokkoli71
Copy link
Member

For some specific shapes of chunks_per_shard (e.g. [5,2]), the current implementation of morton code will produce indices that are not within the shape. Therefore reading the sharding index will drop some chunks.
I was wondering if using morton code is a significant efficiency improvement over normal indexing like np.unravel_index

@d-v-b
Copy link
Contributor

d-v-b commented Oct 16, 2024

For some specific shapes of chunks_per_shard (e.g. [5,2]), the current implementation of morton code will produce indices that are not within the shape.

Is this a bug, or something fundamental to morton coding?

@brokkoli71
Copy link
Member Author

brokkoli71 commented Oct 16, 2024

@d-v-b
As far as I understand, morton coding should theoretically be able to traverse arrays of any shape. So I think it's a bug in our implementation.

I wonder if it's worth investigating further how to fix it, or if we just use a simple iteration algorithm like np.unravel_index.

@d-v-b
Copy link
Contributor

d-v-b commented Oct 16, 2024

@d-v-b As far as I understand, morton coding should theoretically be able to traverse arrays of any shape. So I think it's a bug in our implementation.

I wonder if it's worth investigating further how to fix it, or if we just use a simple iteration algorithm like np.unravel_index.

I thought the use of the morton ordering was motivated by the desire to place spatially contiguous chunks close together in a shard. The simpler linearization algorithms will not allow this. Maybe it's better to fix the bug in the morton encoding?

@brokkoli71 brokkoli71 marked this pull request as ready for review October 22, 2024 09:57
@brokkoli71
Copy link
Member Author

@d-v-b
The morton encoding was actually right.
The problem was that morton ordering by is designed to fill arrays with power-of-two shapes. Therefore, it would sometimes return a coordinate outside the array. I implemented that these coordinates are skipped so that most of the time the morton ordering is maintained and only at the edge a few jumps can occur.

@normanrz normanrz self-requested a review November 29, 2024 15:38
@normanrz normanrz merged commit cdd6a74 into zarr-developers:main Nov 29, 2024
26 checks passed
@normanrz normanrz deleted the fix-iterating-over-sharding-index branch November 29, 2024 17:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants